Why Do We Need Classification Metrics?

Consider the following situation:

You're at the doctor's office, and the doc rolls up and introduces herself. As she explains, she's adopted a unique approach to diagnosing patients: for every patient, her diagnosis is that they don't have cancer, no matter what symptoms they exhibit.

You're confused and somewhat concerned, but she assures you she has a track record of 99.9% accuracy.

After some quick googling, you find out cancer occurs in less than 0.1% of the total population.

Do you stay or get the heck outta there?

The doc isn't lying. She may have a 99.9% accuracy, but her diagnosis method is completely useless.

Out of the patients who did have cancer, she didn't predict a single one of them right. You should probably find a new physician.

Accuracy isn't a great measure when the classes are this imbalanced. We need something else: a set of metrics that describe how well the doctor picks up the actually positive cases, and how well she rules out the actually negative ones.
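
To make that concrete, here's a quick sketch with made-up numbers matching the story above (a hypothetical population, not data we'll use later):

In [ ]:
# Hypothetical population: 10,000 patients, roughly 0.1% of whom actually have cancer
import numpy as np

y_true = np.zeros(10_000, dtype=int)
y_true[:10] = 1                        # the 10 patients who really do have cancer

y_pred = np.zeros(10_000, dtype=int)   # the doctor's method: predict "no cancer" for everyone

accuracy = (y_true == y_pred).mean()
cases_caught = y_pred[y_true == 1].mean()   # fraction of the real cancer cases she flags
print('Accuracy:', accuracy)                # 0.999 -- sounds impressive
print('Cases caught:', cases_caught)        # 0.0   -- she misses every single one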

Agenda for today:

  1. Compute prediction probabilities, or the certainty behind a prediction
  2. Identify misclassifications — False Positives and False Negatives
  3. Build better metrics using misclassification rates
  4. Optimize models by tuning certain parameters
  5. Compare the ability of models to discriminate classes using ROC curves

Making Our Own Classifier

Imports

We'll start with the usual imports, and set some options so we get consistent results when generating random data

In [1]:
# The usual imports
import pandas as pd
import numpy as np

# Numpy print options and random seed for consistent results
np.set_printoptions(precision=4, suppress=True)
np.random.seed(123)

# For visualization
import plotly.express as px
import plotly.graph_objects as go

# To model normal distribution
from scipy.stats import norm

# To make data
from sklearn.datasets import make_blobs

Generate Data

Let's make some data! We'll start with 5000 observations.

In [2]:
n = 5000

Here we're using the make_blobs() function to create two classes with similar attributes.

In [3]:
centers = [[9.5, 0], [10.5, 0]] # Define the coordinates to center our blobs (x,y)
X, y = make_blobs(n_samples=n, centers=centers, cluster_std=0.4, random_state=7)
data = pd.DataFrame(X, columns=['feature1','feature2']) # Rename the feature cols
data['target'] = y.astype('str') # Convert dtype to help w/ viz

data.head()
Out[3]:
feature1 feature2 target
0 11.015775 -0.093470 1
1 10.238118 -0.119591 1
2 9.281864 0.103596 0
3 10.790475 -0.234479 1
4 9.851924 0.207997 0

How many observations are in each class?

In [4]:
print('Shape:', data.shape, '\n')
print('Class Frequencies:')
print(data.target.value_counts(normalize=True))
Shape: (5000, 3) 

Class Frequencies:
1    0.5
0    0.5
Name: target, dtype: float64

This 50-50 split means our target classes are balanced

Exploration

  • We have two features, so we can plot the data on a scatter plot and use color to identify classes.
  • We'll also throw on some marginal histograms to visualize the univariate distributions (distribution of one variable)
In [5]:
fig = px.scatter(data.tail(1000), x='feature1',y='feature2',
                 color='target', opacity=.6,
                 color_discrete_sequence = ['red','grey'],
                 category_orders={"target": ["1", "0"]},
                 labels={'target':'Target Class', 
                         'feature1': 'Feature 1', 
                         'feature2':'Feature 2'},
                 marginal_x='histogram',
                 marginal_y='histogram',
                )
fig.show('notebook')

What patterns do you see?

  • Is there one feature that separates the classes better?
  • How would you go about creating a human model to classify data points?

My strategy: draw a boundary at x = 10, since this best separates the classes (as seen in the histograms)

In [6]:
fig = px.scatter(data.tail(1000), x='feature1',y='feature2',
                 color='target', opacity=.6,
                 color_discrete_sequence = ['red','grey'],
                 category_orders={"target": ["1", "0"]},
                 labels={'target':'Target Class', 
                         'feature1': 'Feature 1', 
                         'feature2':'Feature 2'},
                 marginal_x='histogram', range_x = [8.25,11.75],
                )
fig.add_shape(dict(type="line", x0=10, y0=-1.5, x1=10, y1=1.5,
                   line=dict(color="black",width=2,dash="dash",)
                   ))
fig.show('notebook')

One thing to note: As we move farther from the boundary, the certainty of our prediction gets stronger

In other words, the probability that a particular observation is of class 1 increases as you move further to the right
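
To get a feel for the numbers, here's a small sketch using the normal CDF of the scaled distance from the boundary, which is the same trick the classifier below uses in predict_proba (the 0.4 is the cluster standard deviation we generated the blobs with):

In [ ]:
# Predicted probability of class 1 as we move away from the boundary at x = 10
from scipy.stats import norm

boundary = 10
for x1 in [9.2, 9.6, 10.0, 10.4, 10.8]:
    p = norm.cdf((x1 - boundary) / 0.4)   # scaled distance from the boundary -> probability
    print(f'feature1 = {x1:>4}: P(class 1) = {p:.3f}')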

Define a Classifier

Let's use this boundary to make our own little classifier. Not from sklearn, just on our own!

You try! Create a predict() function. In other words:

  1. Take each value from the relevant feature (X_test is a dataframe)
  2. Compare it against the boundary
  3. Make a decision of 1 or 0 based on that — make sure this is an int dtype
In [7]:
class BoundaryClassifier():
    def __init__(self):
        self.name = 'Classify observations on 1D boundary'
    
    def fit(self, X_train, y_train, x_boundary=None):
        self.boundary = x_boundary
        
    def predict(self, X_test):
        b = self.boundary
        x = X_test.feature1
        
        # You try!
        y_pred = (x > b).astype(int)  # 1 if the point falls to the right of the boundary, else 0
        return y_pred
    
    def predict_proba(self, X_test):
        b = self.boundary
        x = X_test.feature1
        
        # Use the normal CDF of the scaled distance from the boundary to model probabilities
        y_pred_proba = ((x - b) / 0.4).apply(norm.cdf)
        return y_pred_proba

As usual, we'll split up our full dataset into training and testing sets

In [8]:
from sklearn.model_selection import train_test_split

X = data.drop(columns=['target'])
y = data.target.astype('int')

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

Now, we use our little boundary classifier just as we would any sklearn model

In [9]:
clf = BoundaryClassifier() # Create the model
clf.fit(X_train, y_train, x_boundary = 10) # Fit it to the data

y_pred = clf.predict(X_test) # Predict classes
y_pred_proba = clf.predict_proba(X_test) # Predict the probability of falling into class 1

Evaluate Performance

Let's take a look at some of the predictions by putting them side by side

In [10]:
test_results = pd.DataFrame(data = {'Actual Class':y_test, 'Predicted Class':y_pred, 'Predicted Probability':y_pred_proba})
test_results.sample(5)
Out[10]:
Actual Class Predicted Class Predicted Probability
3027 0 0 0.020143
3119 0 0 0.000047
1886 1 1 0.882670
613 0 0 0.299785
4384 1 1 0.912213

This makes sense: for most of our observations, the predicted value matches the actual class.

Note that the predicted probability corresponds with the predicted class. A more extreme probability (closer to 0 or 1) means the data point was further from the boundary, so we're more certain which class it belongs to.

Question: How many test observations did we get right? In other words, what's the accuracy of our model?

In [11]:
# You try!
acc = len(test_results[test_results['Actual Class'] == test_results['Predicted Class']]) / len(test_results)
acc
Out[11]:
0.9045
In [12]:
# or 
from sklearn.metrics import accuracy_score
acc = accuracy_score(y_test, y_pred)

acc.round(4)
Out[12]:
0.9045

Question: Of the actually positive observations, how many did we correctly predict as positive?

In [13]:
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)
[[896 106]
 [ 85 913]]

Here's a quick helper function to label the confusion matrix

In [14]:
def custom_confusion_matrix(y_test_, y_pred_proba_, alpha=0.5, output='dataframe'):
    """
    Usage:
        cm = custom_confusion_matrix(y_test, y_pred_proba, output = 'dataframe')
        tn, fp, fn, tp = custom_confusion_matrix(y_test, y_pred_proba, output = 'rates')

    Params:
        alpha: Probability threshold for classifying an observation as 1 (default = 0.5)
        output: 'dataframe' for a labeled DataFrame, 'rates' for the flattened counts (tn, fp, fn, tp), or 'array' for the raw confusion matrix
    """
    y_pred_ = (y_pred_proba_ >  alpha).map({True:1,False:0})
    cf_mat_ = confusion_matrix(y_test_, y_pred_)
    if output == 'dataframe':
        return pd.DataFrame(cf_mat_, columns=['Predicted 0', 'Predicted 1'], index=['Actual 0', 'Actual 1'])
    elif output == 'rates':
        return cf_mat_.ravel()
    else:
        return cf_mat_
In [15]:
cm = custom_confusion_matrix(y_test, y_pred_proba, output = 'dataframe')
cm
Out[15]:
Predicted 0 Predicted 1
Actual 0 896 106
Actual 1 85 913

We can also grab the counts of each category

In [16]:
tn, fp, fn, tp = custom_confusion_matrix(y_test, y_pred_proba, output = 'rates')
tn, fp, fn, tp
Out[16]:
(896, 106, 85, 913)

How would you calculate Sensitivity (aka True Positive Rate — TPR)?

In [17]:
# You try!
tpr  = tp / (tp + fn); tpr.round(4)
Out[17]:
0.9148

What does the following figure describe?
I.e. what metric corresponds to red / (gray + red) among the dots shown here?

In [18]:
fig = px.scatter(data.tail(1000), x='feature1',y='feature2',
                 color='target', opacity=.6,
                 color_discrete_sequence = ['red','grey'],
                 category_orders={"target": ["1", "0"]},
                 labels={'target':'Target Class', 
                         'feature1': 'Feature 1', 
                         'feature2':'Feature 2'},
                 marginal_x='histogram',
                 range_x = [10, 11.75],
                )
fig.add_shape(dict(type="line", x0=10, y0=-1.5, x1=10, y1=1.5,
                   line=dict(color="black",width=4,dash="dash",)
                   ))
fig.show('notebook')
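
One way to read it: the plot zooms in on the region to the right of the boundary, i.e. the observations we predict as positive. The red share of those points, TP / (TP + FP), is commonly called precision (or positive predictive value). A quick check using the counts from the confusion matrix above:

In [ ]:
# Precision: of the observations we predicted positive, how many were actually positive?
precision = tp / (tp + fp); precision.round(4)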

We can visualize which datapoints were misclassified in the following plot

In [19]:
data['prediction'] = (data.feature1 > 10).astype(int).astype('str')
data['prediction_correct'] =  (data.prediction == data.target).astype('str')

fig = px.scatter(data.iloc[y_test.index].sample(n=1000), 
                 x='feature1',y='feature2', opacity=.5,
                 color='target', symbol='prediction_correct',
                 color_discrete_sequence = ['red','grey'],
                 category_orders={'target': ['1','0'],
                                  'prediction_correct':['True','False']},
                 symbol_sequence=['circle','x'],
                 labels={'target':'Actual Class', 
                         'feature1':'Feature 1', 
                         'feature2':'Feature 2',
                         'prediction_correct':'Prediction Correct',
                        },
                 marginal_x='histogram', range_x = [8.25,11.75],
                )
fig.add_shape(dict(type="line", x0=10, y0=-1.5, x1=10, y1=1.5,
                   line=dict(color="black",width=2,dash="dash")))
label_names= ['True Positive', 'True Positive', 
              'False Negative','False Negative', 
              'True Negative','True Negative',
              'False Positive', 'False Positive']
for idx, obj in enumerate(fig.data):
    obj['name'] = label_names[idx]
fig.update_layout(legend_title_text='Classification')
fig.show('notebook')

Class Imbalance

Now let's model the situation from the doctor's office, with a heavy class imbalance.

Here, we'll use a 95-5 split, with only 5% of the target labels being of the positive class

Generate Imbalanced Data

To create this new dataset, we'll resample the old data, sampling each class in proportions that produce the desired balance

In [20]:
imbal = 0.05

class0 = data[data.target == '0'].sample(frac=1-imbal) # Take 95% of the former negatives
class1 = data[data.target == '1'].sample(frac=imbal) # Take 5% of the former positives

cols_to_keep = ['feature1','feature2','target']
data_new = pd.concat([class0,class1]).sample(frac=1)[cols_to_keep] # Combine and shuffle the new data
data_new.head()
Out[20]:
feature1 feature2 target
4952 9.643196 -0.330421 1
1016 9.354889 0.326066 1
901 9.192295 0.010595 0
2503 9.166217 0.091141 0
4678 10.261723 -0.112043 0

Note that the resulting dataset has half the original observations (we threw out quite a bit of the positives)

In [21]:
print('Shape:', data_new.shape, '\n')
print('Class Frequencies:')
print(data_new.target.value_counts(normalize=True))
Shape: (2500, 3) 

Class Frequencies:
0    0.95
1    0.05
Name: target, dtype: float64

To explore the data, let's take a look at the histograms, both before and after

In [22]:
combined = pd.concat([data,data_new]) # Combine both datasets, then label the source
combined['dataset'] = np.concatenate([np.full(len(data),'50-50 Split'),np.full(len(data_new),'95-5 Split')])

fig = px.histogram(combined, x='feature1', color='target', barmode='overlay',
                   opacity = 0.6, facet_col='dataset', 
                   category_orders={"target": ["1", "0"]},
                   color_discrete_sequence = ['red','grey'],
                   labels={'target':'Target Class', 
                           'feature1': 'Feature 1',
                           'count':'Frequency',
                           'dataset':'Data Source'
                          },
                   range_x = [8.25, 11.75],
                  )
for i in [1,2]:
    # Add the boundary line to each subplot
    fig.add_shape(dict(type="line", x0=10, y0=0, x1=10, y1=150,
                       line=dict(color="black",width=2,dash="dash",)
                       ),row=1, col=i)

fig.show('notebook')

Create Model

To model our doctor's "evidence-blind" method, we can create what's known as a "dummy" classifier.

There are different dummy strategies, e.g. randomly predict 1 or 0 with a 50% chance each ('50-50'), always choose a class we specify ('hard-coded'), or always choose the highest-frequency category ('most-freq').

All of these ignore the feature data

In [23]:
class DummyClassifier():
    """
    Example Usage:
    clf = DummyClassifier(strategy='hard-coded')
    clf.fit(X_train, y_train, value_to_predict=0)
    clf.predict(X_test)
    """
    def __init__(self, strategy):
        self.name = 'Dummy Classifier'
        self.strategy = strategy

    def fit(self, X_train, y_train, value_to_predict=None):
        if self.strategy == 'most-freq':
            self.value = y_train.value_counts().index.values[0]
        elif self.strategy == 'hard-coded':
            self.value = value_to_predict
        else:
            self.value = None

        if self.value is not None:
            # Store 1 - P(self.value): used as the constant "probability of class 1" in predict_proba
            self.prior = 1 - y_train.value_counts(normalize=True)[self.value]

    def predict(self, X_test):
        if self.strategy in ['most-freq','hard-coded']:
            # Always predict the same class, ignoring the features entirely
            return pd.Series(np.full(len(X_test), self.value, dtype=int), index=X_test.index)
        elif self.strategy == '50-50':
            np.random.seed(123) 
            return pd.Series(np.random.uniform(0,1,len(X_test)).round(0), index=X_test.index)
        else:
            print('Wrong strategy defined!') 
    
    def predict_proba(self, X_test):
        if self.strategy in ['most-freq','hard-coded']:
            return pd.Series(np.full(len(X_test), self.prior, dtype=float), index=X_test.index)
        elif self.strategy == '50-50':
            np.random.seed(123) 
            return pd.Series(np.random.uniform(0,1,len(X_test)), index=X_test.index)
        else:
            print('Wrong strategy defined!') 

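As an aside, scikit-learn ships its own sklearn.dummy.DummyClassifier with equivalent strategies. A minimal sketch, imported under a different name so it doesn't clash with the class we just wrote:

In [ ]:
# scikit-learn's built-in dummy baselines (renamed so it doesn't shadow our own class)
from sklearn.dummy import DummyClassifier as SklearnDummy

baseline = SklearnDummy(strategy='constant', constant=0)   # always predict the negative class
# baseline = SklearnDummy(strategy='most_frequent')        # or: always predict the most common class
# baseline = SklearnDummy(strategy='uniform')              # or: guess 0/1 with equal probability
baseline.fit(X_train, y_train)
baseline.score(X_test, y_test)                             # accuracy of the baseline
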
Create new train/test splits from our new data: We're just adding a _new suffix onto everything

In [24]:
from sklearn.model_selection import train_test_split

X_new = data_new.drop(columns=['target'])
y_new = data_new.target.astype('int')

X_train_new, X_test_new, y_train_new, y_test_new = train_test_split(X_new, y_new, test_size=0.4, random_state=48)

Now let's use the dummy classifier and specify which value to predict (0, the negative class) for every observation.

Use a _dummy suffix for the classifier and array outputs here

In [25]:
clf_dummy = DummyClassifier(strategy='hard-coded')
clf_dummy.fit(X_train_new, y_train_new, value_to_predict = 0)

y_pred_dummy = clf_dummy.predict(X_test_new)
y_pred_proba_dummy = clf_dummy.predict_proba(X_test_new)

We're gonna try another dummy model, this time one that uses the 50-50 random-guess strategy

Use a _rando suffix for the classifier and array outputs here

In [26]:
clf_rando = DummyClassifier(strategy='50-50')
clf_rando.fit(X_train_new, y_train_new, value_to_predict = 0)

y_pred_rando = clf_rando.predict(X_test_new)
y_pred_proba_rando = clf_rando.predict_proba(X_test_new)

Finally let's create a legitimate boundary classifier

Use a _bc suffix for the classifier and array outputs here

In [27]:
clf_bc = BoundaryClassifier() # Create the model
clf_bc.fit(X_train_new, y_train_new, x_boundary = 10) # Fit it to the data

y_pred_bc = clf_bc.predict(X_test_new) # Predict classes
y_pred_proba_bc = clf_bc.predict_proba(X_test_new) # Predict the probability of falling into class 1

Evaluate Model Performance

Let's use our confusion matrix function from before to see the results

With the dummy classifier

In [28]:
cm_dummy = custom_confusion_matrix(y_test_new, y_pred_proba_dummy, output = 'dataframe')
cm_dummy
Out[28]:
Predicted 0 Predicted 1
Actual 0 949 0
Actual 1 51 0
In [29]:
acc_dummy = accuracy_score(y_test_new, y_pred_dummy); acc_dummy.round(4)
Out[29]:
0.949
In [30]:
tn, fp, fn, tp = custom_confusion_matrix(y_test_new, y_pred_proba_dummy, output = 'rates')
tpr_dummy = tp / (tp + fn); tpr_dummy.round(4)
Out[30]:
0.0

With the random classifier

In [31]:
cm_rando = custom_confusion_matrix(y_test_new, y_pred_proba_rando, output = 'dataframe')
cm_rando
Out[31]:
Predicted 0 Predicted 1
Actual 0 484 465
Actual 1 24 27
In [32]:
acc_rando = accuracy_score(y_test_new, y_pred_rando); acc_rando.round(4)
Out[32]:
0.511
In [33]:
tn, fp, fn, tp = custom_confusion_matrix(y_test_new, y_pred_proba_rando, output = 'rates')
tpr_rando = tp / (tp + fn); tpr_rando.round(4)
Out[33]:
0.5294

With the boundary classifier

In [34]:
cm_bc = custom_confusion_matrix(y_test_new, y_pred_proba_bc, output = 'dataframe')
cm_bc
Out[34]:
Predicted 0 Predicted 1
Actual 0 860 89
Actual 1 6 45
In [35]:
acc_bc = accuracy_score(y_test_new, y_pred_bc); acc_bc.round(4)
Out[35]:
0.905
In [36]:
tn, fp, fn, tp = custom_confusion_matrix(y_test_new, y_pred_proba_bc, output = 'rates')
tpr_bc = tp / (tp + fn); tpr_bc.round(4)
Out[36]:
0.8824

Improving Our Model

Adjust Probability Threshold

Let's take another look at our original boundary classifier on the balanced dataset.

In the code above, we implicitly said: "if the predicted probability of being 1 is greater than 0.5, go ahead and classify it as 1."

A more explicit function call:

In [37]:
cm = custom_confusion_matrix(y_test, y_pred_proba, output = 'dataframe', alpha=0.5)
cm
Out[37]:
Predicted 0 Predicted 1
Actual 0 896 106
Actual 1 85 913
In [38]:
tn, fp, fn, tp = custom_confusion_matrix(y_test, y_pred_proba, output = 'rates')
tpr  = tp / (tp + fn); tpr.round(4)
Out[38]:
0.9148

What if we were to decrease the probability threshold?

I.e. if we lowered the evidence needed to call a particular observation a 1, would we now be predicting more or fewer 1s?

In [39]:
fig = px.scatter(data.iloc[y_test.index].sample(n=1000), x='feature1',y='feature2',
                 color='target', opacity=.6,
                 color_discrete_sequence = ['red','grey'],
                 category_orders={"target": ["1", "0"]},
                 labels={'target':'Target Class', 
                         'feature1': 'Feature 1', 
                         'feature2':'Feature 2'},
                 marginal_x='histogram', range_x = [8.25,11.75],
                )

for step in np.arange(9, 11.2, 0.25):
    fig.add_trace(go.Scatter(x=[step, step],
                             y=[-1.5, 1.5],
                             name="Decision Boundary",
                             mode="lines",
                             line=dict(color="black",width=2,dash="dash"),
                             visible=False))
fig.data[8].visible = True

steps = []
for i in range(4,len(fig.data)):
    step_label = str(round(i/4,2)+8)
    step = dict(
        method="update",
        args=[{"visible": [False] * len(fig.data)}],  # layout attribute
        label = step_label)
    step["args"][0]["visible"][i] = True  # Toggle i'th trace to "visible"
    for q in range(0,4):
        step["args"][0]["visible"][q] = True  # Toggle data traces to "visible"
    steps.append(step)

sliders = [dict(active=4,
                currentvalue={"prefix": "Decision Boundary at x= "},
                pad={"t": 50, 'b':10},
                steps=steps)]

fig.update_layout(sliders=sliders)

fig.show('notebook')
In [40]:
a = 0.3
custom_confusion_matrix(y_test, y_pred_proba, output = 'dataframe', alpha=a)
Out[40]:
Predicted 0 Predicted 1
Actual 0 762 240
Actual 1 40 958

Clearly, we see more observations are being moved into the Predicted 1 column. How does that impact sensitivity? What about specificity?

In [41]:
tn, fp, fn, tp = custom_confusion_matrix(y_test, y_pred_proba, output = 'rates', alpha=a)
tpr  = tp / (tp + fn); tpr.round(4)
Out[41]:
0.9599
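
Sensitivity went up, but specificity (the true negative rate) moves the other way. A quick check with the same counts; with tn = 762 and fp = 240 this comes out to roughly 0.76, down from 896 / (896 + 106) ≈ 0.894 at the default threshold:

In [ ]:
# Specificity (True Negative Rate): of the actual negatives, how many did we correctly call negative?
tnr = tn / (tn + fp); tnr.round(4)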

You try! What would happen if we were to increase the probability threshold?

In [42]:
a = 0.7
custom_confusion_matrix(y_test, y_pred_proba, output = 'dataframe', alpha=a)
Out[42]:
Predicted 0 Predicted 1
Actual 0 967 35
Actual 1 217 781
In [43]:
tn, fp, fn, tp = custom_confusion_matrix(y_test, y_pred_proba, output = 'rates', alpha=a)
tpr  = tp / (tp + fn); tpr.round(4)
Out[43]:
0.7826

Optimization

If the cost of a false negative were greater than that of a false positive, i.e. failing to call out a positive case was really bad, which alpha (of the ones above) would you choose? Why?
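
One way to reason about it is to attach hypothetical costs to each kind of mistake and compare the thresholds directly. A rough sketch, assuming (arbitrarily) that a false negative costs ten times as much as a false positive:

In [ ]:
# Hypothetical costs: a missed positive (FN) hurts 10x more than a false alarm (FP)
cost_fn, cost_fp = 10, 1

for a in [0.3, 0.5, 0.7]:
    tn, fp, fn, tp = custom_confusion_matrix(y_test, y_pred_proba, output='rates', alpha=a)
    print(f'alpha = {a}: total cost = {fn * cost_fn + fp * cost_fp}')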

ROC Curve

Create helper functions

In [44]:
def make_roc(y_test, y_pred_proba, steps=200):
    # Sweep the probability threshold from 0 to 1 and record TPR / FPR at each step
    alpha_vals = np.linspace(0, 1, steps + 1)

    tpr_lst = []
    fpr_lst = []

    for a in alpha_vals:
        tn, fp, fn, tp = custom_confusion_matrix(y_test, y_pred_proba, alpha=a, output='rates')
        tpr_lst.append(tp / (tp + fn))
        fpr_lst.append(fp / (tn + fp))
    
    roc = pd.DataFrame([alpha_vals, tpr_lst, fpr_lst], index=['alpha','tpr','fpr']).T
    
    return roc
In [56]:
def draw_roc(roc_curve, title, margins=0.02, connect_dots=True):
    fig = px.scatter(roc_curve, x='fpr', y='tpr', color='alpha', 
                     range_x=[0-margins,1+margins], range_y=[0-margins,1+margins],
                     color_continuous_scale=px.colors.sequential.Burg,
                     labels={'tpr':'True Positive Rate', 
                             'fpr':'False Positive Rate',
                             'alpha':'Alpha'
                            },
                     title = title,
                     width=540,height=500
                    )
    fig.add_shape( # add a 45-degree line
        type="line", line_color="gray", line_width=3, opacity=1, line_dash="dash",
        x0=0, x1=1, xref="x", y0=0, y1=1, yref="y")
    fig.update_yaxes(tick0=0, dtick=.2)
    fig.update_xaxes(tick0=0, dtick=.2)
    fig.update_xaxes(showline=True, linewidth=2, linecolor='gray')
    fig.update_yaxes(showline=True, linewidth=2, linecolor='gray')

    if connect_dots:
        fig.add_trace(go.Scatter(x=roc_curve.fpr,
                                 y=roc_curve.tpr,
                                 mode="lines",
                                 line=dict(color="gray",width=1),
                                 showlegend=False
                                ))

    return fig

Run on BoundaryClassifier

In [57]:
roc_bc = make_roc(y_test_new, y_pred_proba_bc)
roc_bc.head()
Out[57]:
alpha tpr fpr
0 0.000 1.0 1.000000
1 0.005 1.0 0.909378
2 0.010 1.0 0.865121
3 0.015 1.0 0.827187
4 0.020 1.0 0.791359
In [58]:
fig_bc = draw_roc(roc_bc, title='ROC of Boundary Classifier on 95-5 Dataset', margins=0.03)
fig_bc.show('notebook')

Run on Rando Classifier

In [59]:
roc_rando = make_roc(y_test_new, y_pred_proba_rando)

fig_rando = draw_roc(roc_rando, title='ROC of 50% Rando Classifier on 95-5 Dataset', margins=0.03)
fig_rando.show('notebook')

As you can tell, the ROC is what allows us to really differentiate crappy models from good ones.

What does the 45-degree line tell us?

Run on Dummy Classifier

In [60]:
roc_dummy = make_roc(y_test_new, y_pred_proba_dummy,steps=2)

fig_dummy = draw_roc(roc_dummy, title='ROC of Dummy Classifier on 95-5 Dataset', margins=0.03)
fig_dummy.show('notebook')

BoundaryClassifier on the 50-50 dataset

In [61]:
roc = make_roc(y_test, y_pred_proba)

fig = draw_roc(roc, title='ROC of BoundaryClassifier on 50-50 Dataset', margins=0.03)
fig.show('notebook')

AUC ROC

We can summarize the whole curve with a single metric: the area under the curve, or AUC (of the ROC curve)

  • The higher the curve reaches, the more area it sweeps out, and the closer the AUC gets to 1 (this is desirable)
  • The closer the curve hugs the 45-degree line, the closer the AUC gets to 0.5. The 45-degree line itself has an AUC of exactly 0.5 (no better than random guessing)

We'll use the sklearn.metrics.auc function for this

In [51]:
from sklearn.metrics import auc
auc(roc.fpr, roc.tpr)
Out[51]:
0.9653653614614459
In [52]:
print('ROC AUC of BoundaryClassifier on 50-50 Split dataset:', auc(roc.fpr, roc.tpr).round(4))
ROC AUC of BoundaryClassifier on 50-50 Split dataset: 0.9654
In [53]:
print('ROC AUC of BoundaryClassifier on 95-5 Split dataset:', auc(roc_bc.fpr, roc_bc.tpr).round(4))
ROC AUC of BoundaryClassifier on 95-5 Split dataset: 0.9502
In [54]:
print('ROC AUC of RandoClassifier on 95-5 Split dataset:', auc(roc_rando.fpr, roc_rando.tpr).round(4))
ROC AUC of RandoClassifier on 95-5 Split dataset: 0.5163
In [55]:
print('ROC AUC of DummyClassifier on 95-5 Split dataset:', auc(roc_dummy.fpr, roc_dummy.tpr).round(4))
ROC AUC of DummyClassifier on 95-5 Split dataset: 0.5
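
For reference, scikit-learn can also do the threshold sweep and the area computation for us. A minimal sketch with its built-in roc_curve and roc_auc_score, which should land very close to the hand-rolled numbers above:

In [ ]:
# Built-in equivalents: roc_curve sweeps the thresholds, roc_auc_score summarizes the area
from sklearn.metrics import roc_curve, roc_auc_score

fpr_sk, tpr_sk, thresholds = roc_curve(y_test, y_pred_proba)
print('ROC AUC of BoundaryClassifier on 50-50 Split dataset:', round(roc_auc_score(y_test, y_pred_proba), 4))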

You Try!

Try creating either a KNN or Decision Tree classifier on the balanced (original) dataset.

You can use the existing train test splits, but make sure to:

  1. Fit / Train the model
  2. Generate predictions and probabilities
  3. Show confusion matrix
  4. Show ROC
In [ ]:
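
If you want a starting point, here's one possible sketch using scikit-learn's KNeighborsClassifier together with the helper functions defined above (the hyperparameter choices are arbitrary):

In [ ]:
# One possible solution sketch (hyperparameter choices are arbitrary)
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(X_train, y_train)                       # 1. Fit / train on the balanced split

y_pred_knn = knn.predict(X_test)                # 2. Predicted classes...
y_pred_proba_knn = pd.Series(knn.predict_proba(X_test)[:, 1], index=X_test.index)  # ...and P(class 1)

print(custom_confusion_matrix(y_test, y_pred_proba_knn, output='dataframe'))       # 3. Confusion matrix

roc_knn = make_roc(y_test, y_pred_proba_knn)    # 4. ROC curve
draw_roc(roc_knn, title='ROC of KNN on 50-50 Dataset').show('notebook')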